98 research outputs found

    Mining Host-Pathogen Interactions

    Get PDF

    An Exploratory Study of Social Media Analysis for Rare Diseases using Machine Learning Algorithms: A case study of Trigeminal Neuralgia

    Get PDF
    Rare diseases, affecting approximately 30 million Americans, are often poorly understood by clinicians due to lack of familiarity with the disease and proper research. Patients with rare diseases are often unfavorably treated, especially those with extremely painful chronic orofacial rare disorders. In the absence of structured knowledge, such patients often choose social media to seek help from peers within patient-oriented social media communities thereby generating tremendous amounts of unstructured data daily. We investigate whether we can organize this unstructured data using machine learning to help members of rare communities find relevant information more efficiently in real-time. We chose Trigeminal Neuralgia (TN), an extremely painful rare disorder, as our case study and collected 20,000 social media TN posts. We categorized TN posts into Twitter (very short), and Facebook (short, medium, long) datasets based on message length and performed three clustering experiments. Results revealed GSDMM outperformed both K-means and Spherical K-means in clustering Facebook especially for short messages in terms of speed. For long messages, MDS reduction outperformed the PCA when both were used with K-means and Spherical K-means. Our study demonstrated the need for further topic modeling to utilize among high level clusters based on semantic analysis of posts within each cluster

    Accelerating large-scale protein structure alignments with graphics processing units

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Large-scale protein structure alignment, an indispensable tool to structural bioinformatics, poses a tremendous challenge on computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons.</p> <p>Findings</p> <p>We present <it>ppsAlign</it>, a parallel protein structure Alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, <it>ppsAlign </it>could take many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated <it>ppsAlign </it>on an NVIDIA Tesla C2050 GPU card, and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH.</p> <p>Conclusions</p> <p><it>ppsAlign </it>is a high-performance protein structure alignment tool designed to tackle the computational complexity issues from protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using massive parallel computing power of GPU.</p

    Finding shortest and nearly shortest path nodes in large substantially incomplete networks

    Full text link
    Dynamic processes on networks, be it information transfer in the Internet, contagious spreading in a social network, or neural signaling, take place along shortest or nearly shortest paths. Unfortunately, our maps of most large networks are substantially incomplete due to either the highly dynamic nature of networks, or high cost of network measurements, or both, rendering traditional path finding methods inefficient. We find that shortest paths in large real networks, such as the network of protein-protein interactions (PPI) and the Internet at the autonomous system (AS) level, are not random but are organized according to latent-geometric rules. If nodes of these networks are mapped to points in latent hyperbolic spaces, shortest paths in them align along geodesic curves connecting endpoint nodes. We find that this alignment is sufficiently strong to allow for the identification of shortest path nodes even in the case of substantially incomplete networks. We demonstrate the utility of latent-geometric path-finding in problems of cellular pathway reconstruction and communication security

    Phytonematode Peptide Effectors Exploit a Host Post‐Translational Trafficking Mechanism to the ER using a Novel Translocation Signal

    Get PDF
    Summary Cyst nematodes induce a multicellular feeding site within roots called a syncytium. It remains unknown how root cells are primed for incorporation into the developing syncytium. Furthermore, it is an enigma how CLAVATA3/ESR (CLE) peptide effectors secreted into the cytoplasm of the initial feeding cell could have an effect on plant cells so distant from where the nematode is feeding as the syncytium expands. Here we describe a novel translocation signal within nematode CLE effectors that is recognized by plant cell secretory machinery to redirect these peptides from the cytoplasm to the apoplast of plant cells. We show that the translocation signal is functionally conserved across CLE effectors identified in nematode species spanning three genera and multiple plant species, operative across plant cell types, and can traffic other unrelated small peptides from the cytoplasm to the apoplast of host cells via a previously unknown post‐translational mechanism of ER translocation. Our results uncover an unprecedented mechanism of effector trafficking by any plant pathogen to date and illustrates how phytonematodes can deliver effector proteins into host cells and then hijack plant cellular processes for their export back out of the cell to function as external signaling molecules to distant cells

    Protein interaction network of alternatively spliced isoforms from brain links genetic risk factors for autism

    Get PDF
    Increased risk for autism spectrum disorders (ASD) is attributed to hundreds of genetic loci. The convergence of ASD variants have been investigated using various approaches, including protein interactions extracted from the published literature. However, these datasets are frequently incomplete, carry biases and are limited to interactions of a single splicing isoform, which may not be expressed in the disease-relevant tissue. Here we introduce a new interactome mapping approach by experimentally identifying interactions between brain-expressed alternatively spliced variants of ASD risk factors. The Autism Spliceform Interaction Network reveals that almost half of the detected interactions and about 30% of the newly identified interacting partners represent contribution from splicing variants, emphasizing the importance of isoform networks. Isoform interactions greatly contribute to establishing direct physical connections between proteins from the de novo autism CNVs. Our findings demonstrate the critical role of spliceform networks for translating genetic knowledge into a better understanding of human diseases

    Structural Modeling of Protein Interactions by Analogy: Application to PSD-95

    Get PDF
    We describe comparative patch analysis for modeling the structures of multidomain proteins and protein complexes, and apply it to the PSD-95 protein. Comparative patch analysis is a hybrid of comparative modeling based on a template complex and protein docking, with a greater applicability than comparative modeling and a higher accuracy than docking. It relies on structurally defined interactions of each of the complex components, or their homologs, with any other protein, irrespective of its fold. For each component, its known binding modes with other proteins of any fold are collected and expanded by the known binding modes of its homologs. These modes are then used to restrain conventional molecular docking, resulting in a set of binary domain complexes that are subsequently ranked by geometric complementarity and a statistical potential. The method is evaluated by predicting 20 binary complexes of known structure. It is able to correctly identify the binding mode in 70% of the benchmark complexes compared with 30% for protein docking. We applied comparative patch analysis to model the complex of the third PSD-95, DLG, and ZO-1 (PDZ) domain and the SH3-GK domains in the PSD-95 protein, whose structure is unknown. In the first predicted configuration of the domains, PDZ interacts with SH3, leaving both the GMP-binding site of guanylate kinase (GK) and the C-terminus binding cleft of PDZ accessible, while in the second configuration PDZ interacts with GK, burying both binding sites. We suggest that the two alternate configurations correspond to the different functional forms of PSD-95 and provide a possible structural description for the experimentally observed cooperative folding transitions in PSD-95 and its homologs. More generally, we expect that comparative patch analysis will provide useful spatial restraints for the structural characterization of an increasing number of binary and higher-order protein complexes

    Structural Similarity and Classification of Protein Interaction Interfaces

    Get PDF
    Interactions between proteins play a key role in many cellular processes. Studying protein-protein interactions that share similar interaction interfaces may shed light on their evolution and could be helpful in elucidating the mechanisms behind stability and dynamics of the protein complexes. When two complexes share structurally similar subunits, the similarity of the interaction interfaces can be found through a structural superposition of the subunits. However, an accurate detection of similarity between the protein complexes containing subunits of unrelated structure remains an open problem
    corecore